Mathematical Statistics Learning Roadmap

1. Structured Learning Path

Phase 1: Mathematical Foundations (Weeks 1-10)

1.1 Advanced Linear Algebra

  • Vector spaces and linear transformations
  • Eigenvalues, eigenvectors, and matrix decompositions (SVD, QR, Cholesky)
  • Positive definite matrices and quadratic forms
  • Projection matrices and orthogonalization
  • Matrix calculus and derivatives
  • Tensor operations and multilinear algebra

1.2 Real Analysis Fundamentals

  • Limits, continuity, and sequences
  • Convergence concepts (pointwise, uniform, in probability, almost sure)
  • Open and closed sets, compactness, connectedness
  • Continuous functions and their properties
  • Fixed point theorems (Banach, Brouwer)
  • Differentiation and the Mean Value Theorem

1.3 Measure Theory Essentials

  • σ-algebras and measurable sets
  • Borel σ-algebras and Borel sets
  • Measure spaces and properties of measures
  • Lebesgue measure on ℝ
  • Measurable functions
  • Integration basics (Riemann vs. Lebesgue)

1.4 Probability Theory Foundations

  • Probability spaces and axioms
  • Events and probability measures
  • Independence of events
  • Conditional probability and Bayes' theorem
  • Elementary combinatorics and counting
  • First examples of random variables

Phase 2: Probability Theory (Weeks 11-22)

2.1 Random Variables & Distributions

  • Random variables as measurable functions
  • Cumulative distribution functions (CDFs)
  • Probability mass functions (PMFs) and probability density functions (PDFs)
  • Transformations of random variables
  • Joint, marginal, and conditional distributions
  • Independence of random variables
  • Order statistics and their distributions

2.2 Moments & Characteristic Functions

  • Expectation and variance (definitions and properties)
  • Moments and central moments
  • Higher moments: skewness, kurtosis
  • Covariance and correlation
  • Moment generating functions (MGFs)
  • Characteristic functions (Fourier transforms)
  • Cumulant generating functions

2.3 Convergence Theorems

  • Types of convergence: in distribution, in probability, almost surely, in Lp
  • Law of Large Numbers (weak and strong)
  • Central Limit Theorem and generalizations
  • Slutsky's theorem and continuous mapping theorem
  • Convergence of MGFs and characteristic functions
  • Delta method and Taylor expansions

2.4 Standard Probability Distributions

  • Discrete families: Bernoulli, Binomial, Poisson, Geometric, Hypergeometric
  • Continuous families: Normal, Exponential, Gamma, Beta, Uniform, Cauchy
  • Relationships between distributions
  • Limiting distributions and approximations
  • Multivariate distributions (multinomial, multivariate normal)
  • Compound distributions and mixtures

2.5 Dependence & Stochastic Processes

  • Copulas and measures of dependence
  • Markov chains and Markov properties
  • Random walks and martingales
  • Brownian motion and Wiener processes
  • Poisson processes
  • Introduction to stochastic calculus

Phase 3: Statistical Inference Foundations (Weeks 23-34)

3.1 Probability Sampling Theory

  • Sampling distributions for standard statistics
  • t-distribution, chi-square distribution, F-distribution
  • Sample mean, sample variance properties
  • Sampling from normal populations
  • Asymptotic distributions of sample statistics
  • Bootstrap and resampling distributions

3.2 Estimation Theory

  • Point estimation: definitions and concepts
  • Unbiased estimators and bias
  • Sufficiency and minimal sufficiency
  • Factorization theorem
  • Completeness and Basu's theorem
  • Information and Fisher information matrix

3.3 Properties of Good Estimators

  • Consistency and asymptotic normality
  • Efficiency and Cramér-Rao lower bound
  • Asymptotic efficiency and relative efficiency
  • Mean squared error and risk
  • Robustness and influence functions
  • Adaptive estimation

3.4 Methods of Estimation

  • Maximum Likelihood Estimation (MLE)
  • Properties of MLEs (consistency, asymptotic normality, efficiency)
  • Method of moments
  • Least squares estimation
  • M-estimation and robust estimation
  • Empirical likelihood

3.5 Interval Estimation

  • Confidence intervals: definition and properties
  • Construction via pivotal quantities
  • Confidence intervals for means, variances, proportions
  • Asymptotic confidence intervals
  • Bayesian credible intervals
  • Coverage probability and correctness

Phase 4: Hypothesis Testing & Advanced Inference (Weeks 35-46)

4.1 Hypothesis Testing Framework

  • Null and alternative hypotheses
  • Type I and Type II errors
  • Power and power functions
  • Likelihood ratio tests
  • Neyman-Pearson Lemma
  • Uniformly Most Powerful (UMP) tests

4.2 Standard Hypothesis Tests

  • Tests for means (one-sample, two-sample)
  • Tests for variances
  • Tests for proportions
  • Goodness-of-fit tests (χ², Kolmogorov-Smirnov, Anderson-Darling)
  • Independence and homogeneity tests
  • Non-parametric tests (Mann-Whitney, Wilcoxon, Kruskal-Wallis)

4.3 Multiple Testing & Optimality

  • Multiple comparisons problem
  • Bonferroni and Holm corrections
  • False discovery rate (FDR) control
  • Step-up and step-down procedures
  • Uniformly Most Powerful Unbiased (UMPU) tests
  • Invariance and invariant tests

4.4 Asymptotic Theory

  • Asymptotics of MLEs: consistency and asymptotic normality
  • Z-tests and asymptotic tests
  • Contiguity and LAN (Local Asymptotic Normality)
  • Efficiency in asymptotic sense
  • Non-regular models and rates of convergence
  • Empirical processes and weak convergence

4.5 Bayesian Inference

  • Prior distributions and elicitation
  • Posterior distributions and Bayes' theorem
  • Conjugate families
  • Credible intervals and Bayesian hypothesis tests
  • Loss functions and decision theory
  • Minimax, admissibility, and shrinkage

Phase 5: Advanced Statistical Theory (Weeks 47-56)

5.1 Decision Theory & Optimality

  • Decision problems and loss functions
  • Risk functions and comparison of procedures
  • Admissibility and completeness
  • Minimax procedures and minimax risk
  • Stein effect and shrinkage estimation
  • Admissibility in multivariate normal settings

5.2 Nonparametric & Semiparametric Methods

  • Nonparametric density estimation
  • Kernel methods and smoothing
  • Bandwidth selection and cross-validation
  • Semiparametric models and partial likelihood
  • U-statistics and V-statistics
  • Empirical likelihood and bootstrap

5.3 Large Sample Theory

  • Consistency under general conditions
  • Asymptotic normality and CLT variants
  • Rates of convergence and slow rates
  • Donsker's theorem and weak convergence
  • Empirical process theory
  • M-estimation asymptotic theory

5.4 High-Dimensional Statistics

  • The curse of dimensionality
  • Sparse recovery and compressed sensing
  • High-dimensional covariance estimation
  • Dimension reduction techniques
  • Penalized estimation (Lasso, adaptive Lasso)
  • Oracle inequalities and adaptation

5.5 Sampling & Order Statistics

  • Limit theorems for order statistics
  • Extreme value theory and tail behavior
  • Quantile estimation and processes
  • Record values
  • Truncated and censored distributions
  • Competing risks and multivariate survival

Phase 6: Specialized Advanced Topics (Weeks 57-64)

6.1 Causal Inference Theory

  • Potential outcomes framework
  • Rubin causal model
  • Causal effects and identifiability
  • Instrumental variables
  • Difference-in-differences and propensity scores
  • Sensitivity analysis and robustness

6.2 Statistical Learning Theory

  • VC dimension and Rademacher complexity
  • Generalization bounds and consistency
  • Regularization and empirical risk minimization
  • Statistical learning guarantees
  • PAC-learning framework
  • Uniform convergence rates

6.3 Information Theory in Statistics

  • Entropy and mutual information
  • Kullback-Leibler divergence
  • Divergence measures (Hellinger, Wasserstein, χ²)
  • Information inequalities
  • Rényi entropy and generalizations
  • Applications in hypothesis testing and coding

6.4 Bayesian Asymptotics

  • Posterior consistency and rates
  • Bernstein-von Mises theorem
  • Spike-and-slab priors and variable selection
  • Empirical Bayes and marginal likelihood
  • Laplace approximations
  • Variational Bayes theory

6.5 Advanced Estimation Theory

  • Efficient influence functions
  • Semiparametric efficiency bounds
  • Double robustness and debiased estimators
  • M-estimation and Z-estimation
  • Quasi-likelihood and sandwich estimators
  • Mediation analysis and path-specific effects

2. Major Algorithms, Techniques, and Tools

Core Theoretical Techniques

| Technique | Category | Purpose | Complexity |
|---|---|---|---|
| Maximum Likelihood Estimation | Point Estimation | General-purpose estimation | Medium |
| Method of Moments | Point Estimation | Simple estimation alternative | Low |
| Least Squares | Point Estimation | Linear relationships | Low-Medium |
| M-Estimation | Robust Estimation | Outlier-resistant inference | High |
| Empirical Likelihood | Nonparametric | Distribution-free inference | High |
| Likelihood Ratio Tests | Hypothesis Testing | Optimal testing framework | Medium |
| Neyman-Pearson Lemma | Hypothesis Testing | Optimal test construction | High |
| Stein Estimation | Shrinkage Methods | Variance reduction | High |
| Jackknife | Resampling | Variance and bias estimation | Medium |
| Bootstrap | Resampling | General inference method | Medium |

Asymptotic & Convergence Results

| Result | Type | Application Scope |
|---|---|---|
| Law of Large Numbers | Convergence | Consistency of sample means |
| Central Limit Theorem | Convergence | Asymptotic distributions |
| Delta Method | Convergence | Functions of asymptotic normals |
| Cramér-Rao Lower Bound | Optimality | Efficiency bounds |
| Slutsky's Theorem | Convergence | Combining convergence results |
| Continuous Mapping Theorem | Convergence | Convergence preservation |
| LAN (Local Asymptotic Normality) | Asymptotic | Optimal rates theory |
| Bernstein-von Mises | Bayesian | Posterior asymptotics |
| Donsker's Theorem | Weak Convergence | Empirical processes |

Mathematical Tools & Software

Proof & Theory Development:

  • LaTeX for mathematical typesetting
  • Overleaf for collaborative manuscript writing
  • Beamer for mathematical presentations
  • GitHub for version control of research
  • arXiv for preprint distribution

Mathematical Computation:

  • Mathematica: Symbolic and numerical computation
  • Maple: Computer algebra system
  • Wolfram Language: Technical computing
  • SageMath: Open-source mathematics software
  • SymPy (Python): Symbolic mathematics

Statistical Computation & Verification:

  • R: Statistical computing (base + ggplot2, tidyverse)
  • Python (NumPy, SciPy, Statsmodels): Scientific computing
  • MATLAB: Numerical computing
  • Julia: High-performance numerical computing
  • C++/Rcpp: High-speed computation

Data Analysis & Visualization:

  • R (ggplot2, lattice): Statistical graphics
  • Python (Matplotlib, Seaborn, Plotly): Visualization
  • TikZ: Publication-quality figures
  • Asymptote: Vector graphics language

Key Programming Frameworks

| Framework | Language | Purpose | Use Case |
|---|---|---|---|
| tidyverse | R | Data wrangling & analysis | Applied work |
| ggplot2 | R | Visualization | Graphics |
| Statsmodels | Python | Statistical modeling | Regression, testing |
| SciPy.stats | Python | Distributions and tests | Hypothesis testing |
| NumPy | Python | Numerical arrays | Computation |
| Mathematica | Wolfram | Symbolic computation | Proofs, derivations |
| Julia | Julia | Performance-critical computing | Theory implementation |

3. Cutting-Edge Developments in Mathematical Statistics

Recent Advances (2023-2025)

A. Modern High-Dimensional Theory

  • Exact phase transitions in compressed sensing and matrix recovery
  • Tensor methods and their statistical limits
  • Universality phenomena in random matrix theory
  • Algorithmic barriers and computational-statistical tradeoffs
  • Sum-of-squares methods and hierarchies of relaxations
  • Implicit regularization and implicit bias of gradient descent

B. Robust Statistics Revolution

  • Computationally efficient robust estimators with theoretical guarantees
  • Robust covariance estimation and high-dimensional robust methods
  • Adversarial robustness and certified robustness
  • Byzantine-robust distributed learning
  • Contamination models and breakdown points
  • Certified algorithms for robust inference

C. Causal Inference Theory Advances

  • Double/debiased machine learning with nonparametric nuisance parameters
  • Heterogeneous treatment effects (HTE) with rigorous theory
  • Local causal discovery and conditional independence structure
  • Causal effect bounds and partial identification
  • Time-varying treatments and dynamic regimes
  • Graphical causal models with hidden variables

D. Distribution-Free Inference

  • Conformal prediction and conformalized quantile regression
  • Valid inference without distributional assumptions
  • Predictive inference with guarantees
  • Sequential predictive inference
  • Nonparametric bootstrap improvements
  • Honest inference and sample splitting

E. Statistical Foundations of Deep Learning

  • Implicit regularization and generalization of neural networks
  • Overparametrization and interpolation regimes
  • Double descent phenomenon and test error curves
  • Neural network theory: kernel regimes and feature learning
  • Representation learning and feature dimension
  • Optimization-generalization tradeoffs in deep learning

F. Information-Theoretic Limits

  • Minimax rates for complex problems
  • Sample complexity and information-theoretic bounds
  • Fundamental limits of statistical problems
  • Threshold phenomena in estimation and testing
  • Optimal rates under constraints
  • Information-computation tradeoffs

G. Nonparametric Testing & Adaptation

  • Adaptive significance levels and multiple testing
  • Honest confidence intervals for nonparametric estimation
  • Isotonic regression and shape constraints
  • Testing goodness-of-fit in high dimensions
  • Nonparametric testing under fairness constraints
  • Distribution-free rank tests

H. Empirical Process Theory Extensions

  • High-dimensional empirical processes
  • Multiplier bootstrap and dependent data
  • U-process and V-process theory
  • Localized empirical process theory
  • Functional and infinite-dimensional extensions
  • Weak convergence in function spaces

I. Bayesian Theory & Practice Integration

  • Theoretical guarantees for Bayesian neural networks
  • Laplace approximations and their validity
  • Approximate Bayesian computation (ABC) with guarantees
  • Posterior concentration rates
  • Bayesian robustness and sensitivity analysis
  • Scalable posterior inference

J. Fairness & Bias in Statistics

  • Formal definitions of fairness from first principles
  • Statistical parity and calibration tradeoffs
  • Fairness-accuracy-interpretability triangles
  • Optimal fair classifiers with statistical theory
  • Discrimination testing and validation
  • Causal fairness and counterfactuals

4. Project Ideas: Beginner to Advanced

Beginner Projects (2-4 weeks)

Project 1: Probability Distribution Relationships

Create a comprehensive document illustrating relationships between standard distributions: limiting cases, special cases, transformations. Include derivations of key properties and verify with simulation.

Project 2: Convergence Visualization

Implement visualizations of the Law of Large Numbers and the Central Limit Theorem for different distributions and sample sizes. Show rates of convergence and illustrate concepts such as the three-sigma rule.
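A minimal simulation sketch of this project in Python (NumPy assumed; the helper `standardized_means`, the Exponential parent, and the sample sizes are illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_means(sample_size, n_reps=5000, rate=1.0):
    """Standardized sample means sqrt(n)*(xbar - mu)/sigma for Exp(rate) draws."""
    mu = sigma = 1.0 / rate                     # Exponential(rate): mean = sd = 1/rate
    x = rng.exponential(scale=1.0 / rate, size=(n_reps, sample_size))
    return np.sqrt(sample_size) * (x.mean(axis=1) - mu) / sigma

# The third standardized moment shrinks roughly like 1/sqrt(n),
# quantifying how fast the CLT "kicks in" for a skewed parent.
for n in (5, 50, 500):
    z = standardized_means(n)
    print(n, round(float(np.mean(z**3)), 2))
```

Plotting histograms of `z` against the standard normal density at each `n` turns the same simulation into the visualization the project asks for.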

Project 3: Cramér-Rao Lower Bound Analysis

Derive Cramér-Rao lower bounds for standard families (Normal, Exponential, Poisson). Compare theoretical bounds with actual estimator variances via simulation.
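One hedged starting point in Python (NumPy assumed): for the Poisson family the sample mean is unbiased and attains the bound exactly, which makes it a clean first test case before moving to families where the bound is not attained.

```python
import numpy as np

rng = np.random.default_rng(1)

lam, n, reps = 3.0, 40, 20000
# Fisher information per Poisson(lam) observation is 1/lam, so the
# Cramér-Rao lower bound for unbiased estimators of lam is lam / n.
crb = lam / n

# The MLE of lam is the sample mean, with variance exactly lam / n.
samples = rng.poisson(lam, size=(reps, n))
mle = samples.mean(axis=1)
print("CRB:", crb, "empirical variance:", round(float(mle.var()), 4))
```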

Project 4: MLE Properties Exploration

Implement MLEs for common distributions and empirically verify consistency, asymptotic normality, and efficiency through simulation studies with varying sample sizes.
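A sketch of the simulation design, using the Exponential rate as one example family (Python with NumPy assumed; the sample sizes and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# The MLE of the Exponential rate is 1/xbar. Its bias (order 1/n) and its
# spread (order 1/sqrt(n)) both shrink as n grows, consistent with
# consistency and asymptotic normality.
rate, reps = 2.0, 10000
for n in (10, 100, 1000):
    x = rng.exponential(scale=1.0 / rate, size=(reps, n))
    mle = 1.0 / x.mean(axis=1)
    print(n, round(float(mle.mean()), 3), round(float(mle.std()), 3))
```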

Project 5: Hypothesis Testing Power Analysis

Develop comprehensive power curves for standard tests (t-test, z-test, chi-square). Show how power depends on effect size, sample size, and significance level.
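For the z-test the power function is available in closed form, which gives an exact baseline to check simulated power curves against. A sketch (Python with SciPy assumed; `z_test_power` is an illustrative helper name):

```python
from scipy.stats import norm

def z_test_power(effect, n, alpha=0.05):
    """Exact power of the two-sided one-sample z-test (known sd = 1),
    with `effect` the true mean shift in standard-deviation units."""
    z = norm.ppf(1 - alpha / 2)
    shift = effect * n ** 0.5
    # Reject when |Z + shift| > z for Z ~ N(0, 1) under the alternative.
    return norm.sf(z - shift) + norm.cdf(-z - shift)

# Power grows with both effect size and sample size.
for n in (10, 30, 100):
    print(n, round(z_test_power(0.3, n), 3))
```

Evaluating this on a grid of effect sizes and plotting one curve per `n` gives the power curves the project describes.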

Intermediate Projects (4-8 weeks)

Project 6: Order Statistics Distribution Theory

Derive and verify distributions of order statistics for standard families. Compute expected values, variances, covariances. Create visualizations of joint distributions.
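The uniform case gives closed-form answers to verify against. A minimal check in Python (NumPy assumed; sample size and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# For U(0,1) samples of size n, the k-th order statistic is Beta(k, n - k + 1);
# in particular the maximum has mean n / (n + 1).
n, reps = 5, 50000
u = rng.uniform(size=(reps, n))
mx = u.max(axis=1)
print(round(float(mx.mean()), 3), "vs theoretical", round(n / (n + 1), 3))
```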

Project 7: Sufficiency & Factorization Theorem

Find minimal sufficient statistics for various probability families. Verify Basu's theorem relating sufficiency, completeness, and independence of ancillary statistics.

Project 8: Bootstrap vs. Parametric Inference Comparison

Compare bootstrap confidence intervals with standard parametric intervals across different distributions and statistics. Assess coverage properties and computational efficiency.
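A minimal sketch of the bootstrap side (Python with NumPy assumed; `percentile_bootstrap_ci` is an illustrative helper implementing the percentile method only, where the full project would also compare basic, studentized, and BCa intervals):

```python
import numpy as np

rng = np.random.default_rng(2)

def percentile_bootstrap_ci(data, stat, level=0.95, n_boot=2000):
    """Percentile bootstrap confidence interval for stat(data)."""
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    boot = np.apply_along_axis(stat, 1, data[idx])  # statistic on each resample
    return tuple(np.quantile(boot, [(1 - level) / 2, (1 + level) / 2]))

data = rng.exponential(scale=2.0, size=200)     # skewed population, true mean 2.0
lo, hi = percentile_bootstrap_ci(data, np.mean)
print(round(float(lo), 2), round(float(hi), 2))
```

Repeating this over many simulated datasets and counting how often the interval covers the true mean gives the coverage comparison the project asks for.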

Project 9: Asymptotic Normality Under Misspecification

Investigate behavior of MLEs and M-estimators under model misspecification. Study sandwich estimators, influence functions, and robustness properties.

Project 10: Multiple Testing & FDR Control

Implement false discovery rate controlling procedures (Benjamini-Hochberg, step-up, step-down). Compare with Bonferroni in simulations. Assess power and FDR control.
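The Benjamini-Hochberg step-up rule itself is only a few lines; a sketch in Python (NumPy assumed), which could serve as the core of the simulation study:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    hits = np.nonzero(below)[0]
    if hits.size:
        # Reject every p-value up to the largest i with p_(i) <= q * i / m.
        reject[order[: hits.max() + 1]] = True
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.7]))
```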

Project 11: Nonparametric Density Estimation

Implement kernel density estimators with various kernels and bandwidth selectors. Study asymptotic properties, rates of convergence, and optimal smoothing.
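A minimal Gaussian-kernel version to build on (Python with NumPy assumed; `gaussian_kde` is an illustrative helper, and Silverman's rule of thumb stands in for the bandwidth selectors the project would compare):

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_kde(data, grid, bandwidth=None):
    """Gaussian-kernel density estimate on a grid; defaults to
    Silverman's rule-of-thumb bandwidth 1.06 * s * n^(-1/5)."""
    x = np.asarray(data, dtype=float)
    n = len(x)
    if bandwidth is None:
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

data = rng.normal(size=1000)
grid = np.linspace(-4, 4, 161)
dens = gaussian_kde(data, grid)
print(round(float(dens[80]), 3))    # estimate at x = 0; the true density is ~0.399
```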

Project 12: Extreme Value Theory Application

Analyze tail behavior using generalized extreme value and Pareto distributions. Estimate return periods, confidence intervals, and compare parametric/nonparametric methods.

Advanced Projects (8-16 weeks)

Project 13: Semiparametric Efficiency & Influence Functions

Derive influence functions for complex parameters in semiparametric models. Compute efficient influence functions and semiparametric efficiency bounds.

Project 14: Local Asymptotic Normality (LAN)

Develop LAN theory for a class of statistical models. Prove local asymptotic normality and derive asymptotic distributions of test statistics.

Project 15: High-Dimensional Covariance Estimation

Implement shrinkage estimators and regularized covariance estimators (Ledoit-Wolf, graphical lasso). Compare rates of convergence in high dimensions.

Project 16: Causal Inference with Doubly Robust Estimation

Develop theory and implementation for doubly robust estimators combining propensity scores and outcome regression. Analyze efficiency and robustness properties.

Project 17: Empirical Process Theory Application

Apply empirical process theory to derive uniform convergence rates for estimators. Compute VC dimension and Rademacher complexity bounds.

Project 18: Bayesian Asymptotics Study

Establish Bernstein-von Mises theorem for a specific model class. Study posterior concentration rates and Laplace approximations.

Project 19: Robust M-Estimation Theory

Derive asymptotic normality of M-estimators under general conditions. Study breakdown points, efficiency, and robustness properties.

Project 20: Stein Effect & Shrinkage Analysis

Prove the Stein phenomenon in multivariate normal estimation. Develop James-Stein estimators and verify superior risk properties theoretically and empirically.
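The empirical half of this project can be sketched directly (Python with NumPy assumed; the choice of `theta`, dimension, and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

p, reps = 10, 20000
theta = np.linspace(-1, 1, p)               # true mean vector, dimension p >= 3

x = rng.normal(loc=theta, size=(reps, p))   # X ~ N_p(theta, I)

# The James-Stein estimator shrinks X toward 0 by a data-dependent factor.
shrink = 1 - (p - 2) / (x**2).sum(axis=1, keepdims=True)
js = shrink * x

risk_mle = float(((x - theta) ** 2).sum(axis=1).mean())   # equals p in theory
risk_js = float(((js - theta) ** 2).sum(axis=1).mean())
print(round(risk_mle, 2), ">", round(risk_js, 2))
```

The simulated risk of the James-Stein estimator should come out strictly below that of the MLE for every `theta`, matching the theoretical dominance result.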

Expert Projects (16+ weeks)

Project 21: Minimax Optimal Rates

Establish minimax rates for a complex statistical problem. Derive lower bounds via information theory and upper bounds through procedure construction.

Project 22: High-Dimensional Testing & Adaptation

Develop adaptive testing procedures for high-dimensional hypotheses. Prove optimal rates and adapt to unknown sparsity or smoothness.

Project 23: Compressed Sensing Phase Transitions

Analyze phase transitions in compressed sensing recovery. Study information-theoretic limits vs. algorithmic limits and the role of computational complexity.

Project 24: Fairness-Accuracy Tradeoffs

Formalize fairness constraints in statistical inference. Derive optimal fair classifiers and characterize fundamental tradeoffs between fairness and accuracy.

Project 25: Statistical Theory of Deep Learning

Develop theoretical analysis of neural network estimators. Study implicit regularization, double descent, generalization bounds, and overparametrization effects.

Project 26: Nonparametric Confidence Intervals

Construct honest confidence intervals for nonparametric functionals without parametric assumptions. Prove validity and optimality, handle nuisance parameters.

Project 27: Heterogeneous Treatment Effects Theory

Develop theoretical guarantees for HTE estimation under model misspecification. Analyze efficiency, adaptivity, and local complexity measures.

Project 28: Distribution-Free Inference & Conformal Prediction

Prove validity of conformal prediction and distribution-free methods. Establish optimality and tightness of predictive intervals.
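Split conformal prediction is simple enough to verify numerically before attempting the proofs. A sketch with a deliberately trivial predictor (Python with NumPy assumed; all sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Split conformal prediction with a trivial mean predictor: calibration
# residuals yield a prediction interval valid without distributional assumptions.
n_train, n_cal, n_test, alpha = 200, 200, 1000, 0.1

y = rng.normal(size=n_train + n_cal + n_test)
y_train, y_cal, y_test = np.split(y, [n_train, n_train + n_cal])

mu_hat = y_train.mean()                  # the "fitted model"
scores = np.abs(y_cal - mu_hat)          # nonconformity scores on calibration set
k = int(np.ceil((n_cal + 1) * (1 - alpha)))     # finite-sample-valid rank
q_hat = np.sort(scores)[k - 1]

covered = np.abs(y_test - mu_hat) <= q_hat
print(round(float(covered.mean()), 3))   # should be close to 1 - alpha = 0.9
```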

Project 29: Information-Theoretic Foundations

Prove fundamental limits for a class of statistical problems using information theory. Apply channel coding, sphere packing, and Fano methods.

Project 30: Advanced Limit Theorems

Prove new limit theorems for dependent data, functional data, or complex structures. Include rates of convergence and refinements (Edgeworth expansions, moderate deviations).

Learning Roadmap & Implementation

Phase Completion Criteria

Phase 1 Mastery:

  • Comfortable with proofs in linear algebra, real analysis, measure theory
  • Can work with σ-algebras and measurable functions confidently
  • Understand rigorous probability space formulation

Phase 2 Mastery:

  • Fluent with random variables, distributions, and convergence
  • Know characteristic functions and MGFs well
  • Understand Markov chains and martingales

Phase 3 Mastery:

  • Can derive sampling distributions from first principles
  • Understand sufficiency, completeness, and their implications
  • Know MLE properties and Fisher information theory

Phase 4 Mastery:

  • Can construct optimal tests using Neyman-Pearson lemma
  • Understand asymptotic theory of tests and estimators
  • Comfortable with Bayesian inference foundations

Phase 5 Mastery:

  • Understand decision theory and optimality criteria
  • Know nonparametric methods and their asymptotics
  • Familiar with high-dimensional phenomena

Phase 6 Mastery:

  • Can apply advanced theory to modern problems
  • Understand computational-statistical tradeoffs
  • Can read and understand recent research papers

Recommended Reading by Phase

Phase 1-2 Texts:

  • "Probability and Measure" by Billingsley (measure theory)
  • "A Course in Probability Theory" by Chung (comprehensive probability)
  • "Real and Stochastic Analysis" by Kallianpur (rigorous foundations)

Phase 3-4 Texts:

  • "Statistical Inference" by Casella & Berger (comprehensive)
  • "Testing Statistical Hypotheses" by Lehmann & Romano (hypothesis testing)
  • "In All Likelihood" by Pawitan (likelihood-based inference)

Phase 5-6 Texts:

  • "Asymptotic Statistics" by van der Vaart (modern asymptotics)
  • "Empirical Processes in M-Estimation" by van de Geer (M-estimation theory)
  • "The Elements of Statistical Learning" by Hastie, Tibshirani, Friedman (modern methods)
  • "High-Dimensional Statistics" by Wainwright (high-dimensional theory)

Advanced Theory:

  • "Measure Theory and Fine Properties of Functions" by Evans & Gariepy (geometric measure theory)
  • "Information-Based Complexity" by Traub, Wasilkowski, Wozniakowski
  • "Confidence Intervals and Hypothesis Testing" by Proschan & Shaw (modern approaches)

Timeline & Pace

  • Months 1-3: Phase 1 (Foundations) - Mathematical maturity building
  • Months 4-6: Phase 2 (Probability) - Core probability theory
  • Months 7-9: Phase 3 (Inference Foundations) - Estimation theory
  • Months 10-12: Phase 4 (Hypothesis Testing) - Testing and advanced inference
  • Months 13-15: Phase 5 (Advanced Theory) - Decision theory and nonparametrics
  • Months 16-18: Phase 6 (Specialization) - Cutting-edge topics
  • Months 19-24: Deep dives and research projects

Mathematical Maturity Development

This roadmap assumes increasing mathematical sophistication:

  1. Early Phase: Learn to follow proofs and computational derivations
  2. Mid Phase: Can modify proofs and adapt arguments to new settings
  3. Late Phase: Can conjecture results and prove them independently
  4. Expert Phase: Can read research literature and contribute novel theory

Communities & Resources

Academic & Research Communities

  • Bernoulli Society for Mathematical Statistics and Probability
  • American Statistical Association Section on Nonparametric Statistics
  • Institute of Mathematical Statistics (IMS)
  • Statistical Society of Canada and other national societies
  • Cross Validated (Stack Exchange) for mathematical questions

Key Journals

  • Annals of Statistics (primary journal)
  • JASA (Journal of American Statistical Association)
  • Biometrika (foundational journal)
  • Electronic Journal of Statistics (open access)
  • Statistical Science (reviews and theory)
  • Probability Theory and Related Fields

Conferences & Seminars

  • Joint Statistical Meetings (JSM)
  • Bernoulli Society World Congress
  • SIAM Conference on Mathematics of Data Science
  • IMS Annual Meeting
  • University seminars and working groups

Preprints & Cutting-Edge Work

  • arXiv (math.ST category)
  • bioRxiv, medRxiv (domain-specific preprints)
  • Conference proceedings (COLT, NeurIPS, ICML for learning theory)